Skip to content

Conversation

@Kerushani
Copy link
Contributor

@Kerushani Kerushani commented Nov 24, 2025

JIRA ticket link

F4KRP-130

Implementation description

  • This PR adds SweepClusteringAlgorithm, a clustering implementation that groups a list of Location objects into clusters around a warehouse coordinate.
  • Computes each location’s polar angle relative to the warehouse using atan2, normalized to [0, 2π) using math.tau.

Steps to test

There's a test file (sweep_algorithm_test.py) - there are variables on Lines 33 - 35 that can be adjusted - feel free to test it out with that for different cluster numbers, locations per cluster, and max boxes per cluster.

The test can be run with docker compose exec backend sh -lc 'python -m app.services.implementations.sweep_algorithm_test'

What should reviewers focus on?

Correctness of sweep ordering implementation

  • Angle computation (atan2(lat_diff, lon_diff))
  • Secondary sort by distance squared

Checklist

  • My PR name is descriptive and in imperative tense
  • My commit messages are descriptive and in imperative tense. My commits are atomic and trivial commits are squashed or fixup'd into non-trivial commits
  • I have requested a review from the PL, as well as other devs who have background knowledge on this PR or who will be building on top of this PR

@Kerushani Kerushani force-pushed the sweep-clustering-algorithm branch from 81b41b4 to fd9c1e2 Compare January 31, 2026 21:08
@Kerushani Kerushani changed the title Sweep clustering Add sweep clustering Feb 4, 2026
@Kerushani Kerushani force-pushed the sweep-clustering-algorithm branch from fd9c1e2 to e303b8d Compare February 7, 2026 22:00
@Kerushani Kerushani marked this pull request as ready for review February 7, 2026 23:53
@claude
Copy link

claude bot commented Feb 7, 2026

Code Review

Summary

The sweep clustering algorithm is a solid start — the polar-angle sort with distance as a tiebreaker is the right approach for the sweep heuristic. A few issues should be addressed before merging.


Critical Issues

1. Protocol signature mismatch

ClusteringAlgorithmProtocol.cluster_locations does not include warehouse_lat / warehouse_lon parameters, but SweepClusteringAlgorithm.cluster_locations adds them. This breaks structural subtyping — the class will not satisfy the protocol at type-check time, and code that dispatches clustering algorithms through the protocol interface will not be able to pass warehouse coordinates.

Either update the protocol to include these parameters (which makes sense since geographic clustering algorithms generally need an anchor point) or use a different approach (e.g., inject coordinates at construction time).

2. TimeoutError shadows the built-in

Defining class TimeoutError(Exception) shadows Python's built-in TimeoutError. The same issue exists in sweep_algorithm.py, but adding a second definition compounds the problem. Consider reusing a shared custom exception (perhaps in a exceptions.py module) or removing the override entirely and just using the built-in TimeoutError.

3. num_clusters is ignored in the clustering logic

The algorithm sorts locations by angle + distance and then greedily fills clusters based solely on the max_locations_per_cluster and max_boxes_per_cluster constraints. The num_clusters parameter has no effect on how many clusters are produced — the resulting number of clusters depends entirely on the constraint limits. If both constraints are None, all locations end up in a single cluster. This diverges significantly from the documented behaviour ("Target number of clusters to create") and from what every other implementation does.

4. max_locations_per_cluster / max_boxes_per_cluster of None causes a TypeError

would_exceed_locations = current_location_count + 1 > max_locations_per_cluster  # TypeError if None

If either constraint is None, comparing against it will raise a TypeError at runtime. Wrap with if max_locations_per_cluster is not None guards, similar to how mock_clustering_algorithm.py handles this.


Significant Issues

5. Pre-flight validation logic is copied verbatim from MockClusteringAlgorithm

The base_cluster_size / remainder / max_cluster_size validation block is identical to the mock. However, since the sweep algorithm doesn't respect num_clusters during distribution (issue #3), this validation becomes misleading — it may pass even when the actual resulting cluster count is wrong, and may reject valid inputs.

6. Return type annotation of helper functions is wrong

calculate_angle_from_warehouse and calculate_distance_squared are annotated -> float | None but they either return a float or raise an exception — they never return None. This will confuse type checkers and mislead readers.

7. sweep_algorithm_test.py should not live in implementations/

Test/script files should live under the tests/ directory (or be removed before merge if this is a development-only exploration script). Having it in implementations/ pollutes the production package and violates the project's test layout (backend/python/tests/). Compare with k_means_test.py — ideally that one should move too.


Minor Issues

8. Imports are not sorted per isort convention

Standard-library imports (import math, import time) appear after the third-party/local imports. Per PEP 8 and the project's linting setup, they should come first.

9. Missing blank lines between class definitions (PEP 8 / ruff)

class LocationLatitudeError(Exception):
    ...
    pass


class LocationLongitudeError(Exception):
    ...
class TimeoutError(Exception):  # ← missing blank line

10. noqa: ARG002 missing on timeout_seconds

MockClusteringAlgorithm uses # noqa: ARG002 for the unused timeout_seconds parameter. SweepClusteringAlgorithm does use it, but only via the inner check_timeout closure — the linter may still flag this depending on how it analyses closures. Worth double-checking.

11. Lambda parameter name shadows outer variable

sorted_locations = sorted(locations_with_angles, key=lambda location: (location[1], location[2]))

The lambda argument location shadows the location variable from the comprehension loop above. Use a different name (e.g., item or entry).

12. sweep_algorithm_test.py uses asyncio.run(main()) but main() uses synchronous SQLModel Session and create_engine

The session is not truly async (it uses the sync Session), yet the function is declared async def. This works but is misleading. Either use asyncio.run with a truly async session or make main() synchronous.

13. Location.latitude is not None filter in test script uses wrong SQLAlchemy syntax

.where(Location.latitude is not None, Location.longitude is not None)

In SQLAlchemy/SQLModel, is not None in Python evaluates to True at the Python level — it does not produce a SQL IS NOT NULL clause. The correct syntax is .where(Location.latitude.is_not(None), Location.longitude.is_not(None)) (or Location.latitude != None with isnot).


Suggestions

  • Consider adding unit tests in backend/python/tests/ that test the sweep clustering with synthetic Location objects, similar to how test_driver_assignment_service.py is structured. Edge cases worth covering: locations at the same angle, exactly num_clusters locations, all locations at the warehouse coordinate, and constraint violations.
  • The docstring still says "Simple mock clustering algorithm" (copied from MockClusteringAlgorithm). Update it to describe the actual sweep approach.

from sqlmodel import Session, create_engine, func, select

from app.models.location import Location
from app.models.route_stop import RouteStop # noqa: F401
@Kerushani Kerushani requested a review from ludavidca February 8, 2026 01:06
Copy link
Collaborator

@ludavidca ludavidca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall Implementation seems solid, but there are still some lint issues! Please feel free to resolve them and request review again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants